Medical Decision Making — Latest Matching Preprints

1

The Inflation Reduction Act's Impact Upon Late-Stage R&D

Bowen, H. P.; O'Loughlin, G.; Schleicher, C.; Schulthess, D.

2026-05-28 health economics 10.64898/2026.05.20.26353648 medRxiv

Top 0.2%

0.9%

Background: The impact of the Inflation Reduction Act (IRA) upon late-stage developments has been assumed to be limited. The Congressional Budget Office's IRA analysis excluded post-approval innovation, potentially overlooking substantial economic risks to drug developers and declines in the availability of treatments in areas of high unmet medical need such as oncology. Methods: A total of 1148 secondary trials from 364 FDA-approved medicines, published from 2018 to 2025, were obtained from Biomedtracker and clinicaltrials.gov. Using fractional multinomial logit, we model the share distribution of secondary indication studies across 19 disease groups and assess the change in this distribution post-IRA. We also assessed the number of secondary treatment studies pre- vs. post-IRA using multiple linear regression. Results: After the IRA's introduction, small molecule follow-on studies in oncology exhibited a statistically significant 35% decline (R² = 0.48, p < 0.014) and lead indication small molecule oncology approvals exhibited a statistically significant 27% decline (R² = 0.70, p < 0.002). We also find a statistically significant 14% decline in the share of orphan oncology studies pre- vs. post-IRA (p < 0.001). Research Conclusions: This study's results refute claims that the IRA would have minimal negative effects on patient access or late-stage biopharmaceutical R&D. We hope this study reinvigorates debate about the law's unintended consequences and encourages thoughtful policy solutions, as the IRA manifestly creates disincentives that negatively impact patients seeking needed new medicines, particularly those requiring cures addressing metastatic late-stage cancers.

2

Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv

Top 0.2%

0.8%

The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded into the electronic medical record and used to warn clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient level (where the maximum prediction prior to the onset of sepsis is used to measure performance) vs the novel prediction level (where each prediction is used to measure performance). Sepsis, defined by the Sepsis-3 criteria, occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination (AUC 0.86; IQR 0.85, 0.87), whereas prediction-level analyses demonstrated lower performance (AUC 0.62; IQR 0.57, 0.65). Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.
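The difference between the two evaluation levels can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy scores, the per-prediction labels, and the helper names `auc`, `patient_level`, and `prediction_level` are hypothetical and are not taken from the study, which does not describe its labelling rule in the abstract.

```python
from collections import defaultdict

def auc(scores, labels):
    """Mann-Whitney AUC: P(positive score > negative score) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def patient_level(rows):
    """One score per hospitalization: its maximum prediction.
    Assumption: a hospitalization is positive if any of its rows is positive."""
    max_score = defaultdict(float)
    any_event = defaultdict(int)
    for hosp_id, score, label in rows:
        max_score[hosp_id] = max(max_score[hosp_id], score)
        any_event[hosp_id] = max(any_event[hosp_id], label)
    ids = sorted(max_score)
    return [max_score[i] for i in ids], [any_event[i] for i in ids]

def prediction_level(rows):
    """Every prediction counts as its own observation."""
    return [score for _, score, _ in rows], [label for _, _, label in rows]

# Hypothetical toy data: (hospitalization id, risk score, label for that prediction).
rows = [
    ("A", 0.1, 0), ("A", 0.2, 1), ("A", 0.9, 1),  # septic hospitalization
    ("B", 0.3, 0), ("B", 0.4, 0), ("B", 0.5, 0),  # no sepsis
    ("C", 0.2, 0), ("C", 0.6, 0),                 # no sepsis
]

patient_auc = auc(*patient_level(rows))
prediction_auc = auc(*prediction_level(rows))
print(patient_auc, prediction_auc)
```

In this toy example a single late, high score is enough to rank the septic hospitalization first at the patient level, while the prediction-level AUC is pulled down by the early low-scoring prediction. This is the kind of gap the abstract reports, though the real analysis is far larger and may define labels differently.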

3

Use of large language models by academic hospitalists: results of a multicenter survey

Bressman, E.; Auerbach, A.; Keniston, A.; Jens, C.; Ranji, S.

2026-05-29 health systems and quality improvement 10.64898/2026.05.27.26353610 medRxiv

Top 0.4%

0.5%

Introduction: The use of artificial intelligence (AI) by clinicians has increased rapidly in recent years, with large language models (LLMs) emerging as tools that can equal clinician diagnostic performance in simulated settings. However, limited data exist regarding physicians' use of LLMs in real-world clinical practice. This study aimed to evaluate the frequency of LLM use among practicing hospitalists, identify which LLMs are most commonly utilized, and assess hospitalists' perceptions of the benefits and limitations of LLM use in clinical care. Methods: We conducted a cross-sectional survey study of academic hospital medicine faculty across 8 institutions within the Hospital Medicine Reengineering Network (HOMERuN), a collaborative research consortium. Eligible participants included hospitalists practicing within participating HOMERuN sites during the study period. The survey assessed the frequency of LLM use, types of LLMs used, clinical applications, and physician perceptions regarding usefulness, efficiency, and concerns associated with LLM adoption. Results: 170 respondents (67.1%) reported ever using an LLM in clinical practice. Among LLM users, OpenEvidence was the most used tool (88.9%), followed by ChatGPT (58.5%), Google Gemini (26.9%), and Microsoft Copilot (20.5%). Only a minority of hospitalists reported using LLMs daily while seeing patients. The most common use cases of LLMs were answering diagnostic (77.1%) and management (77.6%) questions. A majority also reported using LLMs to identify or summarize primary literature (60.0%). Lack of trust in outputs (49.8%), uncertainty around institutional policies (48.6%), and lack of access to secure applications (43.1%) were cited as the most frequent barriers to using LLMs in practice. Discussion: The use of LLMs in clinical practice is already widespread, though regular or daily use is not yet typical. Concerns regarding reliability, patient privacy, and safe integration into clinical workflows remain significant barriers to broader adoption. The responsible implementation of LLMs in hospital medicine will require addressing these barriers.

4

Same household, different choices: variation in health behaviors related to respiratory viruses in Illinois

Larsen, S. L.; Yang, J.; Haslett, E. M.; Anastasi, A.; Venegas, A.; Schieleit, L.; Mahmud, A.; Martinez, P. P.

2026-05-28 public and global health 10.64898/2026.05.26.26354179 medRxiv

Top 0.5%

0.4%

While SARS-CoV-2 and influenza continue to place a significant burden on population health, within-household differences in decisions towards vaccination and seeking care across these two pathogens, and across sociodemographic groups, remain largely unexplored. By conducting a household-level survey in Illinois, we found that many individuals made inconsistent decisions about vaccination: among all adults, 29% were vaccinated for only one of COVID-19 or influenza, and among those with children in the home, 39% lived with a child whose influenza or COVID-19 vaccination status differed from their own. A higher proportion of adults were vaccinated against COVID-19 compared to influenza, while the opposite was true for those younger than 18 years old. These differences hold even when accounting for disparities in coverage by age, race/ethnicity, political affiliation, and socioeconomic status. While vaccinated individuals consistently reported wanting to protect themselves or others, those who declined vaccination reported highly heterogeneous reasons ranging from resource constraints to distrust or misconceptions about vaccination. These differences are even more pronounced for COVID-19 than for influenza, with larger partisan gaps and higher refusal driven by safety concerns, lack of trust, or religious reasons. In contrast to vaccination, the decision to seek medical care when sick showed opposite sociodemographic trends that are likely attributable to illness severity. Our findings highlight that closing gaps in COVID-19 and influenza vaccination coverage will require an integrative strategy that accounts for diverse motivations, fears, and barriers to access, while addressing social inequalities common to both diseases.

5

Explaining socioeconomic inequalities in antibiotic prescribing for common infections in English primary care: a population-based study

Yang, M.; Nguyen, V. N.; Walker, A. S.; Robotham, J. V.; van Leeuwen, E.; Hayward, G.; Butler, C. C.; Pouwels, K. B.

2026-05-27 health economics 10.64898/2026.05.26.26354118 medRxiv

Top 0.7%

0.3%

OBJECTIVES To quantify socioeconomic inequalities in antibiotic prescribing for common infections in primary care, and assess whether these inequalities arise from differences in consultation frequency, prescribing behaviour, or variation in vaccination uptake, smoking, and body mass index. DESIGN Population based cohort study. SETTING Primary care data from Clinical Practice Research Datalink, England. PARTICIPANTS 17,195,399 children and adults estimated to have been registered with a general practice in 2019. MAIN OUTCOME MEASURES Antibiotic prescribing rates (prescriptions per person-year), consultation rates (consultations per person-year), and probability of receiving an antibiotic prescription following consultation. RESULTS Higher deprivation was associated with higher antibiotic prescribing rates for most respiratory tract indications. In children, prescribing rates were 44.8% (95% confidence interval [CI] 41.9% to 47.7%) higher for upper respiratory tract infections and 47.6% (95% CI 44.2% to 51.3%) higher for lower respiratory tract infections in the most versus least deprived twentile. In adults, prescribing rates for lower respiratory tract infections were 22.7% (95% CI 21.4% to 24.1%) higher in the most deprived twentile. Prescribing rates for other indications showed weak, U-shaped, or negative associations with deprivation. Prescribing inequalities were primarily driven by inequalities in consultation rates rather than probability of receiving antibiotics once consulted. Lower influenza vaccination uptake partly accounted for higher consultation rates for respiratory infections among more deprived children, while smoking prevalence contributed to inequalities among adults. CONCLUSIONS Socioeconomic inequalities in antibiotic prescribing vary by indication type and are largely explained by consultation frequency. Reducing inequalities may require interventions that decrease the need to consult, e.g. improving influenza vaccination coverage in children and reducing smoking among adults, rather than focussing solely on prescribing behaviour.

6

Willingness to pay for improved long-term care insurance among beneficiaries or primary family caregivers in a Chinese pilot city: A contingent valuation study

Cao, H.; Li, X.; Cao, Z.

2026-06-01 health economics 10.64898/2026.05.28.26354309 medRxiv

Top 0.7%

0.3%

Background China's rapidly ageing population has increased the demand for long-term care insurance (LTCI), while the sustainability of current financing arrangements remains uncertain. Understanding willingness to pay (WTP) for improved LTCI services among LTCI beneficiaries or primary family caregivers may provide empirical evidence for discussions on acceptable and sustainable contribution mechanisms. Methods We conducted a contingent valuation survey among 278 LTCI beneficiaries or primary family caregivers in Panjin City, Liaoning Province, China. An iterative bidding game with randomized starting bids was used to elicit monthly WTP for a predefined LTCI service improvement scenario. Tobit regression models with heteroskedasticity-robust standard errors were used to estimate factors associated with WTP, including household income, disability severity, satisfaction with current services, and demographic characteristics. Results The mean monthly WTP for improved LTCI services was approximately CNY 300, compared with the current average monthly premium of approximately CNY 120. The median WTP was CNY 250. Higher household income was positively associated with WTP. Compared with participants with monthly household income below CNY 5,000, those in the highest income group above CNY 30,000 reported an additional WTP of CNY 178.9. More severe disability was also associated with higher WTP, whereas greater satisfaction with current LTCI services was associated with lower WTP. These associations were generally consistent across alternative model specifications. Conclusions LTCI beneficiaries or primary family caregivers in this Chinese pilot city reported a willingness to contribute more for improved LTCI services, particularly among those with higher income, greater care needs, or lower satisfaction with current services. These findings may inform discussions on differentiated contribution arrangements and service quality improvements in LTCI financing reform. However, the results should be interpreted cautiously because the study was conducted in a single pilot city and relied on stated-preference data.

7

Coaching for quality improvement under performance-based contracting: a theory-of-change evaluation in Honduras

Munar, W. J.; Aranda, L. E.; Lauria, M. E.; Bernal Lara, P.; Innocenti, C.; Rodriguez, M.

2026-05-30 health systems and quality improvement 10.64898/2026.05.21.26353487 medRxiv

Top 0.8%

0.2%

Introduction. Practice coaching is increasingly used to strengthen quality improvement (QI) capacity in primary healthcare (PHC) systems in low- and middle-income countries (LMICs), yet the causal pathways through which it shifts provider behaviour, and the systemic conditions that enable or constrain those pathways, remain under-theorised. Using a theory-based qualitative evaluation, we examined how and why a practice coaching intervention influenced QI in cervical cancer screening (CCS) and antenatal care (ANC) within Honduras's decentralised PHC system during the third phase of the Salud Mesoamerica Initiative (SMI). Methods. We conducted a within-case explanatory case study. A programme theory was reconstructed before data collection and iteratively refined against evidence. Data comprised semi-structured interviews with 11 mid-level managers, 6 PHC team medical leads, and 2 regional managers, complemented by direct observation and document review. We applied combined deductive and inductive coding, thematic analysis, and pattern matching, and reported per COREQ. Results. We identified four causal patterns that refined the initial programme theory. Three were activated pathways: (1) novel professional identity among participating managers; (2) collective efficacy and data-driven learning, sustained through verifiable progress on observable indicators, strong for CCS but null for ANC, where outcomes were less attributable to teams' actions; and (3) relational coordination, psychological safety, and trust, which provided the interpersonal basis for the first two. A fourth, unanticipated pattern showed structural misalignment between coaching's enabling, learning-based logic and the directive, punitive logic of Honduras's performance-based contracting environment, confining gains to localised enabling bubbles. Conclusion. Coaching can activate meaningful QI pathways in LMIC primary care, but sustained, equitable impact requires deliberate alignment between coaching's learning-oriented principles and the institutional performance management architecture, and matching of coaching investment to clinical processes with observable, attributable outcomes.

8

One-year within-trial and lifetime-horizon modeled health economic evaluation of the risk-stratified Prediabetes Lifestyle Intervention Study (PLIS) for prediabetes remission in Germany

Mohebbi, D.; Vomhof, M.; Montalbo, J.; Winkels, A. K.; Gontscharuk, V.; Chernyak, N.; Dintsios, C.-M.; Kairies-Schwarz, N.; Stark, R.; Emmert-Fees, K. M. F.; Fan, M.; Schick, R.; Schürmann, A.; Bornstein, S.; Heni, M.; Stefan, N.; Jumpertz von Schwartzenberg, R.; Blüher, M.; Lechner, A.; Clavel, J.; Kopf, S.; Szendrödi, J.; Roden, M.; Wagner, R.; Fritsche, A.; Birkenfeld, A. L.; Icks, A.

2026-05-26 health economics 10.64898/2026.05.22.26353768 medRxiv

Top 0.8%

0.2%

Background Lifestyle interventions can increase the probability of remission of prediabetes to normal glucose tolerance, but their economic value remains unclear. We assessed the within-trial and lifetime-horizon modeled cost-effectiveness of intensive and conventional lifestyle interventions in risk-stratified participants with prediabetes. Methods A health economic evaluation was conducted alongside the 12-month multicenter PLIS trial (n=1,105). High-risk participants were randomized to intensive (HR-INT) or conventional (HR-CONV); low-risk participants to conventional lifestyle intervention (LR-CONV) or control (only short single consultation; LR-CTRL) with risk stratification based on insulin secretion, insulin sensitivity, and liver fat content. Within-trial analyses estimated incremental costs per additional remission to normoglycemia and per quality-adjusted life year (QALY). Lifetime cost-effectiveness was modelled using a four-state Markov model. Findings At 12 months, HR-INT and LR-CONV increased remission compared with their respective comparators. The incremental cost per additional remission was €7,081 (95% CI: dominated-47,277) for HR-INT and €4,278 (1,312-11,793) for LR-CONV from a health insurance perspective. A willingness-to-pay of €22,000 (HR-INT) and €7,500 (LR-CONV) per additional remission corresponded to 90% probability of cost-effectiveness. Neither intervention was cost-effective in terms of QALYs gained within the 12-month period. Lifetime modelling suggested that both HR-INT and LR-CONV are not only cost-effective, but also cost-saving, relative to HR-CONV and LR-CTRL, respectively. Also in the probabilistic sensitivity analysis, most simulations indicated dominance (71.7% for HR and 88% for LR). Interpretation Based on short-term economic evaluation, the interventions assessed were cost-effective regarding additional participants with remission, not for incremental QALYs gained. Lifetime modelling suggests cost savings for both risk groups. Targeting populations with lifestyle interventions to achieve prediabetes remission seems to generate good value for money in the long term.
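The terms used in this abstract (incremental cost per additional remission, "dominated", willingness-to-pay) follow the usual incremental cost-effectiveness logic, which can be sketched as below. The function names and the example numbers are hypothetical and chosen only for illustration; they are not the trial's data or the authors' code.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio, or a dominance label.

    delta_cost:   cost of intervention minus cost of comparator
    delta_effect: effect of intervention minus effect of comparator
    """
    if delta_cost == 0 and delta_effect == 0:
        return "no difference"
    if delta_cost <= 0 and delta_effect >= 0:
        return "dominant"    # no more costly and no less effective
    if delta_cost >= 0 and delta_effect <= 0:
        return "dominated"   # no cheaper and no more effective
    return delta_cost / delta_effect

def is_cost_effective(delta_cost, delta_effect, willingness_to_pay):
    """Net monetary benefit rule: WTP * delta_effect - delta_cost >= 0."""
    return willingness_to_pay * delta_effect - delta_cost >= 0

# Hypothetical example: an intervention costs 500 more per participant and
# raises the probability of remission by 0.10.
ratio = icer(500.0, 0.10)
print(ratio)                                   # cost per additional remission
print(is_cost_effective(500.0, 0.10, 7500.0))  # judged against a 7,500 threshold
print(icer(-200.0, 0.05))                      # cheaper and more effective
```

A confidence interval whose lower bound is "dominated", as reported for HR-INT, arises when some resampled estimates fall in the quadrant where the intervention is both more costly and less effective, so no finite ratio applies there.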

9

Can Large Language Models Diagnose Primary Immunodeficiency from Patient-Described Symptoms?

Reteig, L. C.; Woloshin, S.; Maglione, P. J.; Farmer, J. R.; Ong, M.-S.

2026-05-27 allergy and immunology 10.64898/2026.05.26.26353818 medRxiv

Top 0.9%

0.2%

Patients with primary immunodeficiency (PID) often face prolonged diagnostic delays and may increasingly turn to large language models (LLMs) to interpret their symptoms during this period. We evaluated whether an LLM could recognize PID from symptom descriptions derived from interviews with 21 PID patients. In a prior study, we showed that GPT-4o identified PID in 96% of cases when prompted with physician-written patient histories (Rider et al., JACI, 2024). Here, when prompted with symptom descriptions in patients' own words, GPT-5 identified PID in only 7 cases (33%), although it more broadly suggested immune system issues in 18 cases (81%). The gap between these findings indicates that LLMs are sensitive to the language and framing of symptom descriptions, performing substantially worse when patients describe their own symptoms in everyday language than when clinicians summarize patient histories in structured medical terms. This study underscores the need to carefully evaluate how LLMs are used in patient-facing applications.

10

Grounding Language Models in Behavioral Science to Scale Physical Activity Interventions for Hispanic/Latinx Populations

Mantena, S. D.; Johnson, A.; Schuetz, N.; Tolas, A.; Montalvo, S.; Delgado-SanMartin, J.; Ramirez Posada, M.; Du, L.; Zhang, S.; Huynh, A. D.; Oppezzo, M.; King, A. C.; Schmiedmayer, P.; Lawrie, A.; Rodriguez, F.; Ashley, E.; Kim, D. S.

2026-05-28 cardiovascular medicine 10.64898/2026.05.26.26354165 medRxiv

Top 1.0%

0.2%

Objective: Hispanic/Latinx populations in the U.S. experience higher rates of chronic disease linked to physical inactivity, yet digital health interventions remain largely inaccessible to more than 16 million Hispanic/Latinx adults with limited English proficiency. While large language models (LLMs) offer scalable personalization, their use in non-English behavioral coaching is unexplored. This study introduces MHC-Coach-ES, a Spanish-language LLM fine-tuned on the Transtheoretical Model (TTM) of behavior change. Materials and Methods: We fine-tuned Llama 3-70B-Instruct using a two-stage pipeline. First, the model was adapted to Spanish health and motivational language using a 2.21-million-token corpus. Second, it was instruction-tuned on 3,268 translated human written messages to align the model with the Transtheoretical Model (TTM) of Behavioral Change. We compared MHC-Coach-ES with Llama 3-70B-Instruct and translated human-expert messages using a forced-choice preference survey (N = 77) and blinded expert review (N = 2). Results: Spanish-speaking participants significantly preferred MHC-Coach-ES messages over translated human-expert messages (81% preference, P<0.001). Linguistic analysis showed that MHC-Coach-ES produced more temporally anchored messages than the base model (65% vs. 20%), while maintaining readability. In blinded evaluation, clinical experts rated MHC-Coach-ES higher for alignment with Transtheoretical Model stages than human-expert messages (4.83 vs. 4.38 out of 5). The base model also outperformed translated expert messages across preference and expert ratings. Conclusions: Generative AI can operationalize behavioral science frameworks in Spanish, offering a scalable approach to reducing health disparities. The strong performance of both MHC-Coach-ES and the base model highlights the promise of generative and personalized approaches over translation-based localization for theory-driven behavioral interventions.

11

Establishing a framework for human dose prediction in anti-tuberculosis drug development

Patel, A.; Li, A. T.; Solans, B.; Savic, R.

2026-05-28 infectious diseases 10.64898/2026.05.26.26354063 medRxiv

Top 1.0%

0.2%

Rationale: Efficacious dose selection for anti-tuberculosis drugs has traditionally relied on achieving plasma exposures above the minimum inhibitory concentration, but this approach has not consistently aligned with clinical outcomes. Objectives: We sought to identify early pharmacokinetic-pharmacodynamic targets most predictive of clinical efficacious dose. Methods: We conducted a back-translational, pharmacokinetic-pharmacodynamic simulation-based analysis of 15 anti-tuberculosis drugs. Using pharmacokinetic data from multiple biological matrices and a range of pharmacodynamic metrics, we established candidate exposure-response targets for attainment. We systematically evaluated the predictive accuracy of each target pair against established clinical doses to formulate a decision-making framework linking key drug properties to the most predictive targets. Measurements and Main Results: Depending on the target used, projected clinical doses varied widely - both within and across compounds - highlighting the importance of target selection for dose projection and go/no-go decisions. In general, targeting cellular lesion-level drug exposures relative to in vivo preclinical potency provided an effective approach for early dose selection. However, for highly penetrating drugs, targeting site-of-action therapeutic exposures in the caseum was more predictive of clinical dose. Based on these findings, we developed a preliminary dose prediction tool that enables drug developers to estimate clinically relevant dose ranges of compounds using in vitro and early in vivo data. Conclusions: This work establishes and validates a simple, evidence-based framework to standardize early translational decision-making on dose selection of anti-tuberculosis candidates in development.

12

Estimating cost of integrating HBV, HCV, and HIV screening at ANC using Time-Driven Activity Based Costing Approach; A providers perspective comparing Intervention and standard of care at lower health facilities in West Nile sub region, Uganda

Alege, J. B.; Oyore, J. P.; Nanyonga, R. C.; Ssebagereka, A.; Ssempala, R.; Musoke, P.; Orago, A. S. S.

2026-05-26 health economics 10.64898/2026.05.20.26353753 medRxiv

Top 1%

0.1%

Objective: To estimate the cost of integrating HBV, HCV, and HIV screening at antenatal care (ANC) using the Time-Driven Activity-Based Costing (TDABC) approach, from a provider's perspective, comparing the intervention and standard of care at lower health facilities in the West Nile sub-region, Uganda. Methods: The TDABC approach was used to capture resource use and costs associated with delivering integrated HBV, HCV, and HIV screening among pregnant women. This study compared screening uptake among study participants in the intervention and control groups. The setting was five lower health facilities in each of the Koboko and Maracha districts in the West Nile region of Uganda. A total of 1,338 study participants, who were pregnant mothers attending their first ANC visit in the first trimester at the selected 10 facilities, were enrolled in this study. Data were abstracted and collected on personnel/staff time, facility space utilisation, and medical and non-medical equipment. Total cost per patient visit = staff time cost + space cost + equipment cost. The outcome measure was the estimated provider-perspective cost of delivering integrated screening for HBV, HCV, and HIV using the Integrated Care Model, comparing intervention and control groups. Results: Staff capacity cost rates (CCRs) demonstrated considerable variability across cadres and facilities, with an overall mean of USD 0.492 per minute (range: USD 0.167 - 1.318). Laboratory technicians exhibited the highest mean CCR at USD 0.767 per minute for personnel CCRs per patient visit. The lowest mean CPP visit was noted for HBV in the intervention arm (USD 11.43), while the HIV test was the lowest in the control arm (USD 0.43). The HCV test had the highest cost in the control arm (USD 0.52). The CPP visit for positive clients was generally higher than for those who were negative. Equipment CCRs were minimal and highly consistent across facilities, with a mean of USD 0.00069 per minute (±0.0002). The HIV/syphilis combo was the costliest test kit at USD 3.14 per kit, followed by the hepatitis C and hepatitis B test kits at USD 2.47 and USD 0.28, respectively. Facility space CCRs exhibited moderate variation across facilities, ranging from USD 0.01593 to USD 0.03474 per minute. The overall mean CCR for the space for delivering HBV, HCV, or HIV testing was USD 0.0256 (0.0066). Conclusion: Overall, the integration of screening resulted in cost efficiencies where the same staff and space were used for multiple simultaneous tests, reduced marginal costs for HIV tests due to larger procurement volumes, and higher marginal cost additions for HBV and HCV due to pricier reagents.
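The costing identity stated in this abstract can be sketched as follows. In time-driven activity-based costing, each resource's cost is its capacity cost rate (cost per minute) multiplied by the minutes used. The rates below are the mean values quoted in the abstract; the 15-minute visit duration and the function name are hypothetical, so the resulting figure is illustrative only and is not a result of the study.

```python
def cost_per_visit(minutes, staff_ccr, space_ccr, equipment_ccr):
    """Total cost per patient visit = staff + space + equipment cost,
    each computed as capacity cost rate (USD per minute) x minutes used."""
    components = {
        "staff": minutes * staff_ccr,
        "space": minutes * space_ccr,
        "equipment": minutes * equipment_ccr,
    }
    components["total"] = sum(components.values())
    return components

# Mean capacity cost rates quoted in the abstract (USD per minute).
STAFF_CCR = 0.492
SPACE_CCR = 0.0256
EQUIPMENT_CCR = 0.00069

# Assumption: a 15-minute screening visit (duration not given in the abstract).
visit = cost_per_visit(15, STAFF_CCR, SPACE_CCR, EQUIPMENT_CCR)
for name, usd in visit.items():
    print(f"{name}: USD {usd:.4f}")
```

With these rates, staff time accounts for nearly all of the time-based cost, which is consistent with the abstract's remark that equipment rates were minimal. Test kits are consumables and would be added separately under this sketch.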

13

Data Assimilation Substitutes for Biological Complexity in Hybrid Influenza Forecasting Models

Alleman, T. W.; Van Wesemael, T.; Shanker, N.; Mietchen, M. S.; Loo, S.; Ajagbe, S. O.; Baetens, J. M.; Lemaitre, J.; Hill, A. L.; Truelove, S. A.; Bento, A. I.

2026-05-27 public and global health 10.64898/2026.05.19.26353597 medRxiv

Top 2%

0.1%

Hybrid mechanistic-statistical models offer interpretability and adaptability for short-term seasonal epidemic forecasting, but it remains unclear whether their accuracy depends more on increased biological complexity or on the assimilation of richer data. Using eight retrospective influenza seasons in North Carolina, we evaluate whether training on historical data and assimilating auxiliary emergency department (ED) visit data improves four-week-ahead hospital admission forecasts more than adding biological complexity (multi-subtype structure and cross-season immunity). Hierarchical Bayesian training on historical data improves accuracy by 22.4 % (95 % CI: 16.4-28.1 %), and inclusion of ED visit data yields a further 5.3 % (95 % CI: 3.0-7.6 %) improvement, whereas added biological complexity produces diminishing or null gains. We further observe a substitution effect in which ED visit data partially compensates for omitted biological structure. We deployed a simplified model variant in the 2025-2026 CDC FluSight Challenge and ranked among the top ensemble performers, supporting the robustness of Bayesian hierarchical training in real time. Together, these findings indicate that short-term forecast accuracy is driven more by historical learning and assimilating auxiliary signals than by biological fidelity, with implications for how forecasting systems should balance mechanistic complexity.

14

Impact of the Management Development Programme (MDP) on primary health care manager competencies and organisational Performance

Sineke, T.; Shumba, K.; Moolla, A.; Mongwenyana-Makhutle, C.; Hongoro, D.; Miot, J.; Kruger, P.; Graven, J.; Onoya, D.

2026-06-01 health systems and quality improvement 10.64898/2026.05.28.26354357 medRxiv

Top 2%

0.1%

Primary healthcare (PHC) managers are central to the functioning of South Africa's healthcare system, yet many assume leadership roles without formal management training. To address this gap, the Aurum Institute developed the Management Development Programme (MDP), a structured leadership and management training intervention aimed at strengthening PHC management competencies. This study evaluated the impact of the MDP on leadership practices, organisational readiness for change, and workplace stress among PHC managers in the Western Cape Province. A non-randomised matched cluster trial was conducted across 20 PHC facilities. Intervention facilities were purposively selected based on participation in the MDP, while matched control facilities were randomly selected. Data were collected using structured and semi-structured surveys administered to facility managers and clinic staff. Leadership competency was assessed using the Leadership Practices Inventory (LPI), which measures five dimensions of exemplary leadership: Model the Way, Inspire a Shared Vision, Challenge the Process, Enable Others to Act, and Encourage the Heart. Organisational readiness for change was measured using Kotter's 8-Step Framework, while workplace stress was assessed using a 13-item version of the Brief Job Stress Questionnaire focusing on Job Meaning, Environmental Quality, Autonomy, and Control. Intervention effects were estimated using generalised linear models adjusted for manager age, years in role, matched-pair fixed effects, and cluster-robust standard errors. Outcomes were reported as adjusted risk differences with 95% confidence intervals and two-sided p-values. A total of 20 facility managers (median age 51 years; IQR 42-55; 90% female) and 105 clinic staff members (median age 42 years; IQR 35-50) participated in the study. Managers in both intervention and control facilities reported consistently high self-rated leadership competency scores across all LPI domains, with no statistically significant differences between groups. Similarly, clinic staff rated managers highly across the standard LPI domains, and no significant differences were observed between intervention and control facilities. Despite the absence of significant differences in overall leadership competency scores, staff in intervention facilities reported significantly stronger relational and communication practices among managers compared with staff in control facilities (72.7% vs. 64.0%; adjusted risk difference 22.0%, 95% CI 6.1-37.8; p=.007). After adjustment for age and tenure imbalances, intervention facilities also demonstrated significantly higher scores for institutionalised capability and learning culture (adjusted risk difference 21.3%, 95% CI 0.6-42.0; p=.043). Managers who participated in the MDP further reported stronger perceptions of district support, including improved internal leadership and cultural readiness (adjusted risk difference 22.1%, 95% CI 14.0-30.3; p<.001) and greater district leadership and resource availability (adjusted risk difference 28.1%, 95% CI 15.6-40.6; p<.001). No statistically significant differences were observed in workplace stress across any domain. Although the MDP did not produce measurable short-term improvements in managers' self-rated leadership competencies or standard LPI domains as assessed by staff, it was associated with important gains in relational leadership practices, organisational readiness for change, and perceived district support. These findings suggest that structured management training programmes may strengthen critical organisational and interpersonal foundations necessary for sustained performance improvement within PHC settings.

15

Heterogeneity in susceptibility among humans to common respiratory viral infections

Shinozaki, K.; Miura, F.

2026-06-01 infectious diseases 10.64898/2026.05.29.26353692 medRxiv

Top 2%

0.1%

Background: Human challenge trials provide a unique opportunity to quantify pathogen infectivity in terms of the probability of infection given an inoculated dose. However, between-pathogen comparisons are often distorted by individual heterogeneity in host susceptibility and by differences in background immunity across trial populations. We examined how dose-dependent infection risks differ across common respiratory viruses when such heterogeneity is explicitly incorporated. Methods: We conducted a systematic review of human challenge trials for four respiratory viruses: respiratory syncytial virus (RSV), influenza virus, rhinovirus, and adenovirus. Using the extracted data, we fitted dose-response models under different distributional assumptions, allowing both continuous susceptibility variation and discrete immune fractions. We compared alternative heterogeneity models and evaluated pathogen-specific dose-response patterns using original and scaled dose metrics. Results: All four viruses showed substantial heterogeneity in host susceptibility, and models assuming homogeneous susceptibility were unsupported. RSV and influenza were best described by models with a distinct immune or effectively non-susceptible subgroup, and the estimated immune proportions were approximately 40% and 25%, respectively. In contrast, rhinovirus and adenovirus were better explained by continuously distributed susceptibility, with little evidence of a fully immune subgroup. On a scaled dose axis, rhinovirus and adenovirus showed steeper increases in infection risk with dose than RSV and influenza. Conclusions: The structure of susceptibility heterogeneity differs across common respiratory viruses, which in turn shapes dose-dependent infection risks. Incorporating this heterogeneity is essential for valid cross-pathogen comparison and for interpreting human challenge data in epidemiologic and public health contexts.
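
The abstract does not state the functional forms that were fitted. As a minimal sketch of the three kinds of model it contrasts, the code below assumes an exponential dose-response baseline, a fixed immune fraction for discrete heterogeneity, and gamma-distributed relative susceptibility (mean 1) for continuous heterogeneity. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math


def p_infection_homogeneous(dose, r):
    """Exponential dose-response: every host shares the same per-unit-dose rate r."""
    return 1.0 - math.exp(-r * dose)


def p_infection_immune_fraction(dose, r, immune_fraction):
    """Discrete heterogeneity: a fixed fraction of hosts cannot be infected at any dose."""
    return (1.0 - immune_fraction) * p_infection_homogeneous(dose, r)


def p_infection_gamma(dose, r, shape):
    """Continuous heterogeneity: relative susceptibility s ~ Gamma(shape, scale=1/shape).

    Averaging 1 - exp(-r * s * dose) over s has the closed form below.
    Smaller `shape` means more variation between hosts.
    """
    return 1.0 - (1.0 + r * dose / shape) ** (-shape)


# Hypothetical parameter values, for illustration only.
for dose in (1, 10, 100, 1000):
    print(
        dose,
        round(p_infection_homogeneous(dose, 0.1), 3),
        round(p_infection_immune_fraction(dose, 0.1, 0.4), 3),
        round(p_infection_gamma(dose, 0.1, 0.5), 3),
    )
```

Under these assumptions the immune-fraction curve plateaus at one minus the immune fraction however large the dose, whereas the gamma curve keeps rising towards 1, only more slowly than the homogeneous curve.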

16

Integrating vaccination with short-term behavioral guidance enables mpox outbreak control

Maniscalco, D.; Robineau, O.; Boelle, P.-Y.; Mailles, A.; Noel, H.; Tarantola, A.; Velter, A.; Colizza, V.

2026-05-28 infectious diseases 10.64898/2026.05.26.26354088 medRxiv

Top 2%

0.1%

Background. Despite the decline of the 2022 global outbreak, mpox remains an ongoing public health concern, with persistent transmission and emerging viral clades sustaining resurgence risk. Improving preparedness and response is a priority, yet it remains unclear how pre-exposure vaccination and community response can best limit transmission under realistic conditions and whether behavioral adaptation is critical. Methods. We used a data-driven network model of mpox transmission among men who have sex with men (MSM) in the Paris region, parameterized with sexual behavioral data and calibrated to surveillance data from the 2022 outbreak. We evaluated counterfactual scenarios by varying vaccination timing, rollout speed, prioritization, and behavioral responses. Results. Here we show that, with respect to the 2022 epidemic in the Paris region, vaccination alone delivered at the observed rollout speed would not have reproduced the observed epidemic decline, even if initiated on the day of the first European alert, corresponding to 12 days before the first case was reported in France. Achieving comparable control through vaccination alone would have required more than a fourfold increase in rollout speed. Large-scale and long-term reductions in sexual contacts remain instrumental in limiting the epidemic size, although earlier vaccination reduces the proportion of MSM needing to change behavior. In contrast, short-term behavioral measures adopted by the vaccinees, such as sexual abstinence during the 14-day immunity-building period, combined with moderately faster vaccine rollout (+68% for 50% compliance; +34% for 75% compliance), could achieve comparable epidemic control. Targeting individuals with higher sexual activity further improved intervention efficiency. Conclusions. Under realistic reactive vaccination scenarios, mpox control still requires strong behavioral responses.
Combining timely vaccination with short-term behavioral change guidance at vaccine administration offers a feasible path to limit transmission and strengthen outbreak preparedness and response.

17

Changes in Frequency of Resuscitation Among the Oldest Old Following Japan's End-of-Life Care Guideline Revision: A Population-Level Interrupted Time-Series Analysis Using National Open Claims Data

Sakai, M.; Nakayama, T.

2026-05-30 health policy 10.64898/2026.05.28.26354307 medRxiv

Top 2%

0.0%

Resuscitation in the oldest old at the end of life is associated with potential harm, raising concerns about misalignment with patients' goals of care. This study aimed to elucidate changes in the use of resuscitation among the oldest old in Japan following the revision of the national guideline on end-of-life care, which explicitly incorporates the concept of advance care planning. We conducted a repeated cross-sectional study using the National Database of Health Insurance Claims Open Data, including adults aged ≥85 years, from April 2014 to March 2024. The annual number of resuscitation procedures per 100,000 individuals aged ≥85 years was used as the measure of frequency. Resuscitation included closed-chest cardiopulmonary resuscitation (CPR) and endotracheal intubation. Interrupted time series analysis was used to examine changes following the 2018 revision of the national end-of-life care guideline. The frequencies of CPR and endotracheal intubation declined before 2018 (CPR: age 85-89, -68.4 [-87.9 to -48.8]; age ≥90, -106.7 [-131.5 to -82.0]; intubation: age 85-89, -57.5 [-71.8 to -43.2]; age ≥90, -69.5 [-80.7 to -58.3]), but the decline attenuated thereafter (CPR: age 85-89, +56.2 [28.0 to 84.5]; age ≥90, +84.1 [50.7 to 117.6]; intubation: age 85-89, +36.6 [8.5 to 64.7]; age ≥90, +38.3 [23.8 to 52.8]). These findings provide insight into the changes in resuscitation trends following policy interventions supporting end-of-life decision-making. Further studies are needed to better understand the mechanisms underlying this change.
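
The abstract does not give the regression specification. A common segmented-regression form for an interrupted time series, offered here only as an assumed sketch, is shown below. The symbols are illustrative and do not come from the preprint.

```latex
Y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - t_0) D_t + \varepsilon_t,
\qquad
D_t =
\begin{cases}
0, & t < t_0 \\
1, & t \ge t_0
\end{cases}
```

Here $Y_t$ would be the annual number of procedures per 100,000 individuals in an age group, $t_0$ the 2018 guideline revision, $\beta_1$ the pre-revision trend, $\beta_2$ an immediate level change, and $\beta_3$ the change in trend, so the post-revision trend is $\beta_1 + \beta_3$. If the reported estimates are read as $\beta_1$ and $\beta_3$ (an assumption), a negative $\beta_1$ with a smaller positive $\beta_3$ describes a decline that continues but slows, which matches the attenuation the abstract reports.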

18

Case-level artificial intelligence for multi-photo teledermatology submissions: development and internal validation using patient-submitted dermatology images

Patel, V. P.; Sheth, N.; Patel, A.; Patel, Y.

2026-06-01 dermatology 10.64898/2026.05.21.26353816 medRxiv

Top 2%

0.0%

Background: Store-and-forward teledermatology commonly relies on several patient-submitted photographs of the same concern, but most dermatology artificial intelligence models classify single images independently. Objective: To develop and internally validate a case-level diagnostic-support model that aggregates multiple patient-submitted photographs for common dermatologic conditions. Methods: We conducted a retrospective diagnostic-modeling study using the Skin Condition Image Network, a public dataset of deidentified self-taken dermatology images from US adults. We curated 2,336 cases comprising 5,041 images across 10 common inflammatory, allergic, and infectious conditions. Cases were split at the submission level into training, validation, and held-out test sets. Frozen general-purpose and dermatology-specific encoders were compared with image-level classifiers and a gated-attention multiple instance learning model that generated one case-level output from 1-3 images. Results: The strongest image-level baseline, dermatology-specific embeddings with random forest classification, achieved macro/micro ROC-AUCs of 0.797/0.854. Case-level aggregation improved discrimination, with dermatology-specific embeddings plus multiple instance learning achieving mean macro/micro ROC-AUCs of 0.819/0.863 across repeated stratified experiments. The locked final model achieved macro/micro ROC-AUCs of 0.800/0.849 on the held-out test set. Balanced-threshold sensitivity/specificity examples were 0.702/0.688 for eczema and 0.818/0.826 for urticaria. Limitations: Internal validation used a 10-condition subset from a US volunteer dataset; external validation, calibration, subgroup performance analysis, and prospective workflow studies are required. Conclusion: Modeling the teledermatology submission as a multi-image case better reflects asynchronous dermatology workflow than single-image classification. 
The model is preliminary clinician-facing support for structured review and triage, not autonomous diagnosis.
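
As a rough illustration of how gated-attention multiple instance learning can turn 1-3 image embeddings into one case-level representation, the sketch below implements a common formulation of gated attention pooling in plain Python. The weights are random and the dimensions arbitrary. Nothing here reflects the authors' architecture, encoders, or trained parameters, and `gated_attention_pool` is a hypothetical name.

```python
import math
import random


def gated_attention_pool(embeddings, V, U, w):
    """Pool per-image embeddings into one case-level embedding.

    embeddings: list of K vectors of length D (one per photo)
    V, U:       L x D weight matrices (tanh branch and sigmoid gate)
    w:          vector of length L that maps gated features to a score
    Returns (case_embedding, attention_weights).
    """
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    scores = []
    for h in embeddings:
        tanh_part = [math.tanh(z) for z in matvec(V, h)]
        gate_part = [sigmoid(z) for z in matvec(U, h)]
        scores.append(sum(wi * t * g for wi, t, g in zip(w, tanh_part, gate_part)))

    # Softmax over the images of the case (shifted by the max for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted average of the image embeddings.
    dim = len(embeddings[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, embeddings)) for d in range(dim)]
    return pooled, weights


# Hypothetical toy case: 3 photos, 4-dimensional embeddings, 3 attention units.
rng = random.Random(0)
D, L = 4, 3
V = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
U = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
w = [rng.gauss(0, 1) for _ in range(L)]
case = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(3)]

case_embedding, attention = gated_attention_pool(case, V, U, w)
print(attention)
```

In a trained model a classifier head would take `case_embedding` as input, and the attention weights indicate how much each photo contributed to the case-level output.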

19

Fisher information matrix computation for joint longitudinal and survival models to support clinical study design and covariate effect assessment

Fayette, L.; Brendel, K.; Mentre, F.

2026-06-01 pharmacology and therapeutics 10.64898/2026.05.28.26354340 medRxiv

Top 2%

0.0%

Joint modelling of longitudinal data using non-linear mixed effects models and time-to-event outcomes provides a suitable framework to account for informative censoring when estimating biomarker dynamics and quantifying event risk using covariates and longitudinal trajectories. The usefulness of such models in clinical research depends on data collection design, particularly to precisely estimate the association (link) parameter between longitudinal and survival processes. However, optimal design strategies have so far been addressed separately for longitudinal and survival endpoints and remain unexplored for joint models. We propose two Fisher Information Matrix (FIM) computation methods for joint models, relying on Monte Carlo integration over observations combined with either Markov chain Monte Carlo or Adaptive Gaussian Quadrature to integrate random effects. Their accuracy is assessed against clinical trial simulations in an oncological example based on the HORIZON III study with a tumour-growth-survival model including discrete and continuous covariates. We apply these methods to quantify the impact of follow-up duration, sampling richness, sample size, and covariate distribution on parameter uncertainty and test power. In our example, longitudinal-parameter uncertainty is barely affected by follow-up duration or sampling richness, whereas survival-parameter uncertainty decreases substantially from 1-year to 2-year follow-up. The number of subjects needed (NSN) to achieve <15% uncertainty on the link parameter is comparable for a 2-year rich design and a 3-year sparse design. Optimal covariate distributions are stable across designs and systematically improve test power, outperforming longer and richer but non-optimised designs. These FIM-based methods accurately predict uncertainty and test power, enabling design evaluation and NSN computation for joint-model-based clinical studies.
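
One way an NSN can follow from an FIM-based prediction is sketched below, assuming every subject follows the same elementary design so that the population FIM grows linearly with sample size and standard errors shrink as 1/sqrt(N). The abstract does not say whether the authors compute NSN this way. The function name and the numbers are hypothetical.

```python
import math


def number_of_subjects_needed(n_ref, rse_ref, rse_target):
    """Rescale a reference design to reach a target relative standard error (RSE).

    n_ref:      sample size at which the FIM-predicted RSE was obtained
    rse_ref:    predicted RSE (in %) of the parameter of interest at n_ref
    rse_target: desired RSE (in %)
    Assumes SE is proportional to 1/sqrt(N); returns the smallest integer N
    whose predicted RSE is at most rse_target.
    """
    if n_ref <= 0 or rse_ref <= 0 or rse_target <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(n_ref * (rse_ref / rse_target) ** 2)


# Hypothetical numbers: a 100-subject design with a predicted 30% RSE on the link parameter.
print(number_of_subjects_needed(100, 30.0, 15.0))
```

Halving the RSE therefore requires about four times as many subjects under this scaling, which is why changes to follow-up duration or sampling richness can be a cheaper route to the same precision.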

20

Quantifying the Optimism of Naive Cross-Validation for Binary Outcome Prediction with Repeated-Measures Predictors: A Simulation Study and Clinical Illustration

Hagan, J.

2026-05-29 epidemiology 10.64898/2026.05.27.26354222 medRxiv

Top 2%

0.0%

Background. Cross-validation (CV) is widely used to estimate predictive performance, but can overestimate performance when applied at the observation level to repeated-measures data. When continuous predictor variables are measured repeatedly within subjects and the binary outcome is defined at the subject level, naive observation-level CV introduces data leakage through within-subject dependence, producing optimistically biased estimates of the area under the receiver operating characteristic curve (AUROC). The magnitude of this bias and the performance of alternative partitioning strategies have not been formally characterized for this data structure. Methods. Three CV strategies were compared for estimating subject-level AUROC in ridge logistic regression models: naive observation-level 10-fold CV, subject-level 10-fold CV, and leave-one-cluster-out (LOCO) CV. The framework was applied to a motivating clinical dataset of daily oxygenation measures and retinopathy of prematurity outcomes among 101 extremely low birth weight infants. A factorial simulation study was conducted across 162 parameter combinations varying cluster count (20-150), intraclass correlation (0.1-0.5), within-cluster autocorrelation (0.2-0.8), and outcome prevalence (10-35%), with 500 simulated datasets per condition (76,389 valid datasets total). Results. In the motivating dataset, naive CV produced optimism of +0.078 AUROC units for severe ROP prediction (15 events, 101 subjects) and +0.031 for any ROP prediction (48 events). Subject-level 10-fold CV closely approximated LOCO (deviation ≤ 0.015). In the simulation, naive CV optimism ranged from +0.039 to +0.204 across all conditions, increasing monotonically with higher ICC, higher autocorrelation, fewer clusters, and lower event rates. Subject-level 10-fold CV was essentially unbiased relative to LOCO across all 162 conditions (mean absolute deviation = 0.002). Conclusions.
Naive observation-level CV meaningfully overestimates discriminative performance in the repeated-measures binary outcome setting and should not be used. Subject-level CV partitioning effectively eliminates this bias. Accordingly, subject-level partitioning should be considered essential, not optional, when validating prediction models using repeated-measures data with subject-level outcomes.
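
The difference between the two partitioning strategies can be sketched in a few lines of standard-library Python. This is not the author's code: the helper names and the toy data are hypothetical, and model fitting and AUROC estimation are left out. The sketch only shows where the leakage comes from.

```python
import random


def subject_level_folds(subject_ids, k, seed=0):
    """Yield (train_idx, test_idx) with all rows of a subject kept in one fold."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    for fold in range(k):
        test_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train_idx, test_idx


def naive_folds(n_rows, k, seed=0):
    """Yield observation-level folds that ignore which subject a row belongs to."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    for fold in range(k):
        test_idx = rows[fold::k]
        held_out = set(test_idx)
        train_idx = [i for i in rows if i not in held_out]
        yield train_idx, test_idx


def leaked_subjects(subject_ids, train_idx, test_idx):
    """Subjects that contribute rows to both the training and the test set."""
    return {subject_ids[i] for i in train_idx} & {subject_ids[i] for i in test_idx}


# Hypothetical toy data: 20 subjects with 5 repeated measurements each.
subject_ids = [s for s in range(20) for _ in range(5)]

for name, folds in (
    ("subject-level", subject_level_folds(subject_ids, k=5)),
    ("naive", naive_folds(len(subject_ids), k=5)),
):
    n_leaked = [len(leaked_subjects(subject_ids, tr, te)) for tr, te in folds]
    print(name, n_leaked)
```

With subject-level folds no subject appears on both sides of a split, whereas naive folds typically place rows from the same subject in both training and test sets. Libraries such as scikit-learn offer grouped splitters (for example `GroupKFold`) for the same purpose. This sketch does not stratify folds by outcome, which may matter at low event rates.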